9 research outputs found

    Statistical Model Evaluation Using Reproducing Kernels and Stein’s method

    Advances in computing have enabled us to develop increasingly complex statistical models. However, their complexity poses challenges in their evaluation. The central theme of the thesis is addressing intractability and interpretability in model evaluation. The key tools considered in the thesis are kernel methods and Stein's method: kernel methods provide flexible means of specifying features for comparing models, and Stein's method further allows us to incorporate model structures in evaluation. The first part of the thesis addresses the question of intractability. The focus is on latent variable models, a large class of models used in practice, including factor models, topic models for text, and hidden Markov models. The kernel Stein discrepancy (KSD), a kernel-based discrepancy, is extended to deal with this model class. Based on this extension, a statistical hypothesis test of relative goodness of fit is developed, enabling us to compare competing latent variable models that are known up to normalization. The second part of the thesis concerns the question of interpretability with two contributed works. First, interpretable relative goodness-of-fit tests are developed using kernel-based discrepancies developed in Chwialkowski et al. (2015); Jitkrittum et al. (2016); Jitkrittum et al. (2017). These tests allow the user to choose features for comparison and discover aspects distinguishing two models. Second, a convergence property of the KSD is established. Specifically, the KSD is shown to control an integral probability metric defined by a class of polynomially growing continuous functions. In particular, this development allows us to evaluate both unnormalized statistical models and sample approximations to posterior distributions in terms of moments.
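    For context, the KSD referred to throughout can be written, for a fixed-dimension model density p and sample distribution q, as follows. This is a sketch of the usual Langevin-operator definition, not the thesis's latent-variable extension, and the notation is ours:

```latex
% Langevin Stein operator and the induced kernel Stein discrepancy.
% The score \nabla_x \log p(x) is unchanged by rescaling p, so the KSD
% is computable for unnormalized models; f ranges over the unit ball
% of an RKHS \mathcal{H}.
\[
  (\mathcal{A}_p f)(x) \;=\; f(x)\,\nabla_x \log p(x) \;+\; \nabla_x f(x),
  \qquad
  \mathrm{KSD}(p, q) \;=\; \sup_{\|f\|_{\mathcal{H}} \le 1}
    \mathbb{E}_{x \sim q}\!\left[(\mathcal{A}_p f)(x)\right].
\]
```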

    A Kernel Stein Test of Goodness of Fit for Sequential Models

    We propose a goodness-of-fit measure for probability densities modeling observations with varying dimensionality, such as text documents of differing lengths or variable-length sequences. The proposed measure is an instance of the kernel Stein discrepancy (KSD), which has been used to construct goodness-of-fit tests for unnormalized densities. The KSD is defined by its Stein operator: current operators used in testing apply to fixed-dimensional spaces. As our main contribution, we extend the KSD to the variable-dimension setting by identifying appropriate Stein operators, and propose a novel KSD goodness-of-fit test. As with the previous variants, the proposed KSD does not require the density to be normalized, allowing the evaluation of a large class of models. Our test is shown to perform well in practice on discrete sequential data benchmarks.
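    To make the fixed-dimension baseline concrete, here is a minimal one-dimensional sketch of a KSD V-statistic with a Gaussian kernel and a standard-normal model. It is illustrative only: the names are ours, and this is the classical fixed-dimension KSD, not the variable-dimension operator proposed in the paper.

```python
import math

# Minimal 1-D sketch of the classical (fixed-dimension) KSD V-statistic.
# Model: standard normal, whose score is d/dx log p(x) = -x; note that
# the score never touches p's normalizing constant.
SIGMA = 1.0  # Gaussian kernel bandwidth (illustrative choice)

def score(x):
    return -x  # score function of N(0, 1)

def k(x, y):
    return math.exp(-(x - y) ** 2 / (2 * SIGMA ** 2))

def dk_dx(x, y):
    return -(x - y) / SIGMA ** 2 * k(x, y)

def dk_dy(x, y):
    return (x - y) / SIGMA ** 2 * k(x, y)

def d2k_dxdy(x, y):
    return (1.0 / SIGMA ** 2 - (x - y) ** 2 / SIGMA ** 4) * k(x, y)

def stein_kernel(x, y):
    """Stein-modified kernel h_p(x, y); KSD^2 is its mean over all pairs."""
    return (score(x) * score(y) * k(x, y)
            + score(x) * dk_dy(x, y)
            + score(y) * dk_dx(x, y)
            + d2k_dxdy(x, y))

def ksd_sq(sample):
    """V-statistic estimate of the squared KSD for the N(0,1) model."""
    n = len(sample)
    return sum(stein_kernel(x, y) for x in sample for y in sample) / n ** 2

well_fit = [-1.5, -0.75, 0.0, 0.75, 1.5]  # points roughly matching N(0, 1)
poor_fit = [1.5, 2.25, 3.0, 3.75, 4.5]    # same shape, shifted away from 0
# ksd_sq(poor_fit) is much larger than ksd_sq(well_fit)
```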

    Deep Proxy Causal Learning and its Application to Confounded Bandit Policy Evaluation

    Proxy causal learning (PCL) is a method for estimating the causal effect of treatments on outcomes in the presence of unobserved confounding, using proxies (structured side information) for the confounder. This is achieved via two-stage regression: in the first stage, we model relations among the treatment and proxies; in the second stage, we use this model to learn the effect of treatment on the outcome, given the context provided by the proxies. PCL guarantees recovery of the true causal effect, subject to identifiability conditions. We propose a novel method for PCL, the deep feature proxy variable method (DFPV), to address the case where the proxies, treatments, and outcomes are high-dimensional and have complex nonlinear relationships, as represented by deep neural network features. We show that DFPV outperforms recent state-of-the-art PCL methods on challenging synthetic benchmarks, including settings involving high-dimensional image data. Furthermore, we show that PCL can be applied to off-policy evaluation for the confounded bandit problem, in which DFPV also exhibits competitive performance.
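    The two-stage pattern described above can be sketched with a deliberately simplified, fully linear one-dimensional toy. All names and data here are illustrative; this shows only the mechanical stage-1/stage-2 chain, not PCL's confounder-bridge identification, and DFPV replaces both closed-form regressions with learned deep-network feature maps.

```python
# Toy two-stage regression chain (illustrative; not the DFPV algorithm).

def ols_1d(xs, ys):
    """Closed-form simple linear regression; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Toy data: proxy w responds to treatment a; outcome y responds to w.
a = [0.0, 1.0, 2.0, 3.0, 4.0]
w = [0.1, 1.9, 4.1, 6.0, 8.0]     # roughly w = 2a
y = [0.2, 4.1, 8.0, 12.1, 15.9]   # roughly y = 2w

# Stage 1: model the relation between treatment and proxy.
s1, b1 = ols_1d(a, w)
w_hat = [s1 * ai + b1 for ai in a]

# Stage 2: regress the outcome on the stage-1 prediction.
s2, b2 = ols_1d(w_hat, y)

def predict(a_new):
    """Estimated outcome response at a new treatment value."""
    return s2 * (s1 * a_new + b1) + b2
```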

    Testing Goodness of Fit of Conditional Density Models with Kernels

    We propose two nonparametric statistical tests of goodness of fit for conditional distributions: given a conditional probability density function p(y|x) and a joint sample, decide whether the sample is drawn from p(y|x)r_x(x) for some density r_x. Our tests, formulated with a Stein operator, can be applied to any differentiable conditional density model, and require no knowledge of the normalizing constant. We show that 1) our tests are consistent against any fixed alternative conditional model; 2) the statistics can be estimated easily, requiring no density estimation as an intermediate step; and 3) our second test offers an interpretable test result providing insight on where the conditional model does not fit well in the domain of the covariate. We demonstrate the interpretability of our test on a task of modeling the distribution of New York City's taxi drop-off location given a pick-up point. To our knowledge, our work is the first to propose such conditional goodness-of-fit tests that simultaneously have all these desirable properties. Comment: In UAI 2020. http://auai.org/uai2020/accepted.ph
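    Why no normalizing constant is needed can be sketched as follows, using a Langevin-type operator written in our own notation (the paper's exact operators may differ):

```latex
% If p(y|x) = \tilde p(y|x) / Z(x), then the conditional score
% \nabla_y \log p(y|x) = \nabla_y \log \tilde p(y|x) is free of Z(x).
% A Stein operator built from that score,
\[
  (\mathcal{T}_p f)(x, y) \;=\; f(x, y)\,\nabla_y \log p(y \mid x)
    \;+\; \nabla_y f(x, y),
\]
% has mean zero under y \sim p(\cdot \mid x) (given mild boundary
% conditions), so test statistics built from it never evaluate Z(x).
```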

    Informative Features for Model Comparison

    Given two candidate models, and a set of target observations, we address the problem of measuring the relative goodness of fit of the two models. We propose two new statistical tests which are nonparametric, computationally efficient (runtime complexity is linear in the sample size), and interpretable. As a unique advantage, our tests can produce a set of examples (informative features) indicating the regions in the data domain where one model fits significantly better than the other. In a real-world problem of comparing GAN models, the test power of our new test matches that of the state-of-the-art test of relative goodness of fit, while being one order of magnitude faster. Comment: Accepted to NIPS 201
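    The idea of informative features can be illustrated with a toy sketch: score a few candidate locations by how differently the two models' kernel witness functions deviate from the data there, and report the most discriminative one. This is only an illustration of the concept; it is not the paper's linear-time statistic, and all names and data are ours.

```python
import math

def gauss_k(x, y, sigma=0.5):
    """Gaussian kernel (bandwidth is an illustrative choice)."""
    return math.exp(-(x - y) ** 2 / (2 * sigma ** 2))

def witness_sq(v, data, model_sample):
    """Squared kernel mean discrepancy between data and a model at v."""
    d = sum(gauss_k(v, x) for x in data) / len(data)
    m = sum(gauss_k(v, y) for y in model_sample) / len(model_sample)
    return (d - m) ** 2

data    = [0.0, 0.2, -0.1, 0.1, -0.2]    # observations near 0
model_p = [0.1, -0.1, 0.05, 0.0, -0.05]  # samples from P, near 0: good fit
model_q = [2.0, 2.1, 1.9, 2.05, 1.95]    # samples from Q, near 2: poor fit

candidates = [-2.0, 0.0, 2.0]
# Positive score at v means Q fits the data worse than P around v.
scores = {v: witness_sq(v, data, model_q) - witness_sq(v, data, model_p)
          for v in candidates}
best = max(scores, key=scores.get)  # the most informative location
```

Here the location near Q's misplaced mass scores highest, pointing at where Q fits worse than P.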

    A Kernel Stein Test for Comparing Latent Variable Models

    We propose a kernel-based nonparametric test of relative goodness of fit, where the goal is to compare two models, both of which may have unobserved latent variables, such that the marginal distribution of the observed variables is intractable. The proposed test generalises the recently proposed kernel Stein discrepancy (KSD) tests (Liu et al., 2016; Chwialkowski et al., 2016; Yang et al., 2018) to the case of latent variable models, a much more general class than the fully observed models treated previously. As our main theoretical contribution, we prove that the new test, with a properly calibrated threshold, has a well-controlled type-I error. In the case of models with low-dimensional latent structure and high-dimensional observations, our test significantly outperforms the relative Maximum Mean Discrepancy test, which cannot exploit the latent structure. Comment: update test statistic (MCMC version).
